{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 金融 证券 图谱搭建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 图数据库设置"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-28T05:48:19.517559Z",
     "start_time": "2020-01-28T05:48:19.514553Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from py2neo import Graph\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def check_result(result):\n",
    "    for c in result :\n",
    "        print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "graph = Graph(\n",
    "    \"bolt://localhost:7687\",\n",
    "    usernmae=\"neo4j\",\n",
    "    password=\"Neo4j\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1创建索引\n",
    "\n",
    "提高查询效率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "index_scripts = \"\"\" CREATE INDEX ON :CITY(city);\n",
    "                    CREATE INDEX ON :FUND(fund_code);\n",
    "                    CREATE INDEX ON :COMPANY(company);\n",
    "                    CREATE INDEX ON :FUND_CUSTODIAN(company);\n",
    "                    CREATE INDEX ON :FUND_MANAGER(company);\n",
    "                    CREATE INDEX ON :LISTED_COMPANY(company);\n",
    "                    CREATE INDEX ON :INDUSTRY(industry);\n",
    "                    CREATE INDEX ON :MANAGER(manager_id);\n",
    "                    CREATE INDEX ON :PROVINCE(province);\n",
    "                    CREATE INDEX ON :LISTED_COMPANY(share_code);\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CREATE INDEX ON :CITY(city);\n",
      "CREATE INDEX ON :FUND(fund_code);\n",
      "CREATE INDEX ON :COMPANY(company);\n",
      "CREATE INDEX ON :FUND_CUSTODIAN(company);\n",
      "CREATE INDEX ON :FUND_MANAGER(company);\n",
      "CREATE INDEX ON :LISTED_COMPANY(company);\n",
      "CREATE INDEX ON :INDUSTRY(industry);\n",
      "CREATE INDEX ON :MANAGER(manager_id);\n",
      "CREATE INDEX ON :PROVINCE(province);\n",
      "CREATE INDEX ON :LISTED_COMPANY(share_code);\n"
     ]
    }
   ],
   "source": [
    "for scripts in index_scripts.split('\\n'):\n",
    "    scripts = scripts.strip()\n",
    "    print(scripts)\n",
    "    result = graph.run(scripts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2. 安装插件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "主要是 apoc 插件。这里使用的是 `apoc-3.4.0.3-all.jar `\n",
    "\n",
    "具体插件安装方式 可以参考 公众号 `知行并重`中的对应文章，这里就不做展开了"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 知识图谱应用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面是应用部分。主要从两个方面展开\n",
    "1. 可视化\n",
    "\n",
    "利用 neo4j 自带的可视化插件进行\n",
    "\n",
    "2. 关联查询\n",
    "\n",
    "利用 neo4j 的查询语言 `Cypher` 进行相关查询。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1.可视化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关系图谱将数据以关联方式呈现，更直观。正所谓一图胜千言"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里举个例子：\n",
    "**查看山东国资背景的上市公司，在城市和行业上的分布情况。**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "国资背景上市公司\n",
    "https://zhuanlan.zhihu.com/p/22894178"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "山东共有省属企业23家，资产总额12906亿元。其中部分上市公司的有：\n",
    "- 山东钢铁（600022）\n",
    "- 山东黄金（600547）\n",
    "- 鲁商置业（600223）\n",
    "- 银座股份（600858）\n",
    "- 山东高速（600350）\n",
    "- 山东路桥（000498）\n",
    "- 潍柴动力（000338）\n",
    "- 浪潮集团（000977）\n",
    "- 浪潮软件（600756）\n",
    "- 鲁银投资（600784）\n",
    "- 新华制药（000756）\n",
    "- 鲁抗医药（600789）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"\"\"\n",
    "        MATCH p1 = (a:LISTED_COMPANY)-[:IN_INDUSTRY]-(b:INDUSTRY) \n",
    "            WHERE a.share_code in [\"600022\",\"600547\",\"600223\",\"600858\",\"600350\",\"000498\",\"000338\",\"000977\",\"600756\",\"600784\",\"000756\",\"600789\"] \n",
    "        MATCH p2 = (a)-[:IN_CITY]-(c:CITY)-[:IN_PROVINCE]-(d:PROVINCE) \n",
    "        return p1,p2\n",
    "        \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在 console 中运行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# out = graph.run(query).to_table()\n",
    "# out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可视化后的结果如下："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](../pictures/shandong_listed_companies.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2.图查询"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关系图谱的一个很重要的应用就是为上层应用和服务提供图查询结构，并返回对应关联数据。\n",
    "比如推荐系统、以及风控场景下的关联排查等\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面以本次的金融图谱举几个例子"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2.1. 江苏省 最受 基金 欢迎（持仓最多）的上市公司\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "query1 = \"\"\"\n",
    "MATCH p = (a:FUND)-[:IN_PORTFOLIO]->(b:LISTED_COMPANY)-[:IN_CITY]-(c:CITY)-[:IN_PROVINCE]->(d:PROVINCE {province:\"江苏\"})  \n",
    "RETURN c.city as city,b.company as company,count(DISTINCT a.fund_code) as num\n",
    "ORDER BY num DESC\n",
    "limit 10\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>company</th>\n",
       "      <th>num</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>连云港市</td>\n",
       "      <td>江苏恒瑞医药股份有限公司</td>\n",
       "      <td>2887</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>无锡市</td>\n",
       "      <td>江苏卓胜微电子股份有限公司</td>\n",
       "      <td>2644</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>南京市</td>\n",
       "      <td>华泰证券股份有限公司</td>\n",
       "      <td>2472</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>宿迁市</td>\n",
       "      <td>江苏洋河酒厂股份有限公司</td>\n",
       "      <td>2101</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>常州市</td>\n",
       "      <td>常州朗博密封科技股份有限公司</td>\n",
       "      <td>2003</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>常州市</td>\n",
       "      <td>科华控股股份有限公司</td>\n",
       "      <td>1997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>常州市</td>\n",
       "      <td>江苏日盈电子股份有限公司</td>\n",
       "      <td>1992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>苏州市</td>\n",
       "      <td>苏州赛腾精密电子股份有限公司</td>\n",
       "      <td>1983</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>南京市</td>\n",
       "      <td>南京银行股份有限公司</td>\n",
       "      <td>1953</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>苏州市</td>\n",
       "      <td>亚翔系统集成科技(苏州)股份有限公司</td>\n",
       "      <td>1806</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   city             company   num\n",
       "0  连云港市        江苏恒瑞医药股份有限公司  2887\n",
       "1   无锡市       江苏卓胜微电子股份有限公司  2644\n",
       "2   南京市          华泰证券股份有限公司  2472\n",
       "3   宿迁市        江苏洋河酒厂股份有限公司  2101\n",
       "4   常州市      常州朗博密封科技股份有限公司  2003\n",
       "5   常州市          科华控股股份有限公司  1997\n",
       "6   常州市        江苏日盈电子股份有限公司  1992\n",
       "7   苏州市      苏州赛腾精密电子股份有限公司  1983\n",
       "8   南京市          南京银行股份有限公司  1953\n",
       "9   苏州市  亚翔系统集成科技(苏州)股份有限公司  1806"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = graph.run(query1).to_data_frame()\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.2.2. 上市公司高管董事之间的兼任情况"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "query2 = \"\"\"\n",
    "MATCH  (b:COMPANY)-[r1:其他|监事|高管|委员会成员|董事会成员|核心技术人员]->(d:MANAGER)\n",
    "MATCH  (c:COMPANY)-[r2:其他|监事|高管|委员会成员|董事会成员|核心技术人员]->(d)\n",
    "    WHERE  b.company<>c.company  \n",
    "RETURN b.company as company_s,\n",
    "       b.share_code as share_code_s ,\n",
    "       r1.title as title_s,\n",
    "       type(r1) as rel_s,\n",
    "       c.company as company_d,\n",
    "       c.share_code as share_code_d,\n",
    "       d.manager_id as manager_id,\n",
    "       r2.title as title_d,\n",
    "       type(r2) as rel_d,\n",
    "       d.name as name \n",
    "LIMIT 10 ;\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](../pictures/same_manager.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = graph.run(query2).to_data_frame()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>company_d</th>\n",
       "      <td>平安银行股份有限公司</td>\n",
       "      <td>平安银行股份有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>company_s</th>\n",
       "      <td>中国民生银行股份有限公司</td>\n",
       "      <td>中国民生银行股份有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>manager_id</th>\n",
       "      <td>31d540b750077fd402be2e0bcb5a4980</td>\n",
       "      <td>31d540b750077fd402be2e0bcb5a4980</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <td>邵平</td>\n",
       "      <td>邵平</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rel_d</th>\n",
       "      <td>其他</td>\n",
       "      <td>其他</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rel_s</th>\n",
       "      <td>高管</td>\n",
       "      <td>高管</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>share_code_d</th>\n",
       "      <td>000001</td>\n",
       "      <td>000001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>share_code_s</th>\n",
       "      <td>600016</td>\n",
       "      <td>600016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>title_d</th>\n",
       "      <td>战略委员会主席</td>\n",
       "      <td>战略委员会委员</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>title_s</th>\n",
       "      <td>副行长</td>\n",
       "      <td>副行长</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             0  \\\n",
       "company_d                           平安银行股份有限公司   \n",
       "company_s                         中国民生银行股份有限公司   \n",
       "manager_id    31d540b750077fd402be2e0bcb5a4980   \n",
       "name                                        邵平   \n",
       "rel_d                                       其他   \n",
       "rel_s                                       高管   \n",
       "share_code_d                            000001   \n",
       "share_code_s                            600016   \n",
       "title_d                                战略委员会主席   \n",
       "title_s                                    副行长   \n",
       "\n",
       "                                             1  \n",
       "company_d                           平安银行股份有限公司  \n",
       "company_s                         中国民生银行股份有限公司  \n",
       "manager_id    31d540b750077fd402be2e0bcb5a4980  \n",
       "name                                        邵平  \n",
       "rel_d                                       其他  \n",
       "rel_s                                       高管  \n",
       "share_code_d                            000001  \n",
       "share_code_s                            600016  \n",
       "title_d                                战略委员会委员  \n",
       "title_s                                    副行长  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(2).T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.2.3.基金推荐"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "写个简单的基金推荐规则\n",
    "\n",
    "根据某个基金持仓在不同行业的分布情况，基于此推荐与其持仓行业最相似的 Top5 基金。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "fund_code = \"001508\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "query3 = \"\"\"\n",
    "MATCH p = (a:FUND)-[:IN_PORTFOLIO]->(b:COMPANY)-[:IN_INDUSTRY]-(c:INDUSTRY)  \n",
    "    WHERE a.fund_code=\"{}\"   \n",
    "WITH  COLLECT(DISTINCT c.industry) as industry_base  \n",
    "MATCH  (d:FUND)-[:IN_PORTFOLIO]->(e:COMPANY)-[:IN_INDUSTRY]-(f:INDUSTRY) \n",
    "WITH d.fund_code as fund_code,d.name as name, industry_base,collect(DISTINCT f.industry) as industry_target \n",
    "RETURN fund_code,name,\n",
    "       toFloat(length([ind in industry_target where ind in industry_base ]))/length(industry_target) as ration\n",
    "ORDER BY ration DESC\n",
    "LIMIT 5\n",
    "\"\"\".format(fund_code)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = graph.run(query3).to_data_frame()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>fund_code</th>\n",
       "      <th>name</th>\n",
       "      <th>ration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>005454</td>\n",
       "      <td>前海开源医疗健康C</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>159977</td>\n",
       "      <td>创业板TH</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>006254</td>\n",
       "      <td>长城久悦</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>004598</td>\n",
       "      <td>南方中证银行ETF联接C</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>005304</td>\n",
       "      <td>嘉实医药健康C</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  fund_code          name  ration\n",
       "0    005454     前海开源医疗健康C     1.0\n",
       "1    159977         创业板TH     1.0\n",
       "2    006254          长城久悦     1.0\n",
       "3    004598  南方中证银行ETF联接C     1.0\n",
       "4    005304       嘉实医药健康C     1.0"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
